A Classification Tree Approach to Automatic Segmentation of Japanese Compound Sentences

نویسندگان

  • Yujie Zhang
  • Kazuhiko Ozeki
چکیده

It is well known that direct parsing of a long Japanese compound sentence is extremely difficult. Various pre-processing methods have been proposed to segment such a sentence into shorter, simpler ones prior to parsing. The problem with the conventional methods is that some kind of segmentation patterns or heuristic preference scores must be given manually, hence no guarantee for optimality. This paper proposes a new method of sentence segmentation based on a classification tree technique. In this method, optimal segmentation patterns and the optimal order of their application are automatically acquired from training data, linguistic phenomena together with their occurence frequencies being taken into account. Generation of a classification tree is conducted on an EDR corpus, and evaluation results are reported. It is shown that pruning reduces the tree size by a factor of about 1/4 without affecting the performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Plant Classification in Images of Natural Scenes Using Segmentations Fusion

This paper presents a novel approach to automatic classifying and identifying of tree leaves using image segmentation fusion. With the development of mobile devices and remote access, automatic plant identification in images taken in natural scenes has received much attention. Image segmentation plays a key role in most plant identification methods, especially in complex background images. Wher...

متن کامل

Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest

This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...

متن کامل

Automatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique

The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...

متن کامل

A Lexicalized Tree Adjoining Grammar for Thai

This paper describes an alternative formalism for Thai syntax parsing based on a lexicalized tree adjoining grammar (LTAG). We first briefly present some formal background concerning LTAG, which is necessary for an understanding of LTAG and its application to Thai. Specifically, we address several issues regarding difficulties in parsing Thai sentences and how to resolve these issues using LTAG...

متن کامل

Automatic Bunsetsu Segmentation of Japanese Sentences Using a Classification Tree

Bunsetsu, which is comprised of a content word followed by, possibly 0, function words, is a convenient unit for dependency structure analysis of Japanese. There are, however, no spaces indicating bunsetsu boundaries in the orthographic writing of Japanese. Thus a sentence must be segmented into bunsetsu's by some means prior to dependency structure analysis. Conventionally, such segmentation h...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999